Two Methodologies Applied to the Author Profiling Task
Authors
Abstract
This paper describes two methodologies applied to the author profiling task and submitted to the PAN 2013 competition at the CLEF 2013 conference. The first methodology was applied only to the English corpus, whereas the second was applied only to the Spanish corpus. The aim was to evaluate the performance of both methodologies on this task. The results were quite positive for the first methodology, which takes a classical classification approach: diverse features extracted from the texts are fed to a classifier based on random forests. The second methodology, based on graph mining techniques, performed very poorly on the author profiling task.

1 Description of the Methodologies Evaluated

We applied two different methodologies, one for each language. For the English corpus, we employed machine learning techniques with different sets of features; this first methodology is presented in Section 1.1. The Spanish corpus was processed with a second methodology based on graph mining techniques, which is described in Section 1.2.
Similar resources
A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure
Author profiling is a text classification technique used to predict the profiles of the authors of unknown texts by analyzing their writing styles. Author profiles are characteristics of the authors such as gender, age, native language, country and educational background. The existing approaches for Author Profiling suffered from problems like high dimensionality of features and fail to capture th...
Segmenting Target Audiences: Automatic Author Profiling using Tweets: Notebook for PAN at CLEF 2015
This paper describes a methodology proposed for author profiling using natural language processing and machine learning techniques. We used lexical information in the learning process. For those languages without lexicons, we automatically translated them, in order to be able to use this information. Finally, we will discuss how we applied this methodology to the 3rd Author Profiling Task at PA...
UniNE at CLEF 2017: TF-IDF and Deep-Learning for Author Profiling
This paper describes and evaluates a strategy for author profiling using TF-IDF and a Deep-Learning model based on Convolutional Neural Networks. We applied this strategy to the author profiling task of the PAN17 challenge and show that it can be applied to different languages (English, Spanish, Portuguese and Arabic). As features, we suggest using a simple cleaning method for both models, and ...
Author Profile Prediction Using Trend and Word Frequency Based Analysis in Text
The PAN 2017 Author Profiling task includes two prediction targets: the gender of the text authors and their language variety. The presented approach analyzed the trends and topics followed in the training dataset, e.g. authors discussing politics, tech, religion, nature, etc. in their respective tweets. Along with that, single-word and word-pair frequencies were also taken in...
Author's Traits Prediction on Twitter Data using Content Based Approach
This paper describes the methods we employed to solve the author profiling task at PAN-2015. The proposed system is based on simple content-based features to identify an author's age, gender and other personality traits. The problem of author profiling was treated as a supervised machine learning task. First, content-based features were extracted from the text and then different machine lea...